12 research outputs found

    Energy-efficient and high-performance lock speculation hardware for embedded multicore systems

    Full text link
    Embedded systems are becoming increasingly common in everyday life and, like their general-purpose counterparts, they have shifted towards shared-memory multicore architectures. However, they are much more resource constrained, and as they often run on batteries, energy efficiency becomes critically important. In such systems, achieving high concurrency is a key requirement for delivering satisfactory performance at low energy cost. In order to achieve this high concurrency, consistency across the shared memory hierarchy must be accomplished in a cost-effective manner in terms of performance, energy, and implementation complexity. In this article, we propose Embedded-Spec, a hardware solution for supporting transparent lock speculation, without the requirement for special supporting instructions. Using this approach, we evaluate the energy consumption and performance of a suite of benchmarks, exploring a range of contention management and retry policies. We conclude that for resource-constrained platforms, lock speculation can provide real benefits in terms of improved concurrency and energy efficiency, as long as the underlying hardware support is carefully configured. This work is supported in part by NSF under Grants CCF-0903384, CCF-0903295, CNS-1319495, and CNS-1319095, as well as by the Semiconductor Research Corporation under grant number 1983.001. (CCF-0903384 - NSF; CCF-0903295 - NSF; CNS-1319495 - NSF; CNS-1319095 - NSF; 1983.001 - Semiconductor Research Corporation)
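    The abstract mentions exploring contention management and retry policies; the following minimal sketch (a software analogue, not the paper's hardware mechanism) illustrates one such policy: a bounded number of speculative attempts with exponential backoff before falling back to the conventional lock. The abort probability, retry limit, and backoff constants are invented for illustration.

```c
/* Hypothetical sketch of a bounded-retry policy with exponential backoff.
 * The speculative attempt is modelled by a stub that aborts with some
 * probability; real lock speculation would be provided by hardware. */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_RETRIES 4

/* Stub: models one speculative execution of the critical section.
 * Returns true on commit, false on abort (e.g., a data conflict). */
static bool try_speculative_execution(void) {
    return (rand() % 100) < 70;          /* assume ~70% of attempts commit */
}

/* Fallback: run the critical section under the conventional lock. */
static void run_with_lock(void) {
    puts("fell back to the conventional lock");
}

static void critical_section(void) {
    unsigned backoff = 1;
    for (int attempt = 0; attempt < MAX_RETRIES; ++attempt) {
        if (try_speculative_execution()) {
            puts("committed speculatively");
            return;
        }
        /* Contention management: back off before retrying so that
         * conflicting cores do not immediately collide again. */
        for (volatile unsigned i = 0; i < backoff * 1000; ++i) { }
        backoff <<= 1;
    }
    run_with_lock();
}

int main(void) {
    for (int i = 0; i < 5; ++i)
        critical_section();
    return 0;
}
```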

    Evaluating critical bits in arithmetic operations due to timing violations

    Full text link
    Various error models are used in simulations of voltage-scaled arithmetic units to examine application-level tolerance of timing violations. The selection of an error model deserves careful consideration, as differences among error models drastically affect application behavior. In particular, floating-point arithmetic units (FPUs) have architectural characteristics that shape their error behavior. We examine the architecture of FPUs and design a new error model, which we call Critical Bit. We run selected benchmark applications with the Critical Bit model and other widely used error injection models to demonstrate the differences.
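    To illustrate why the position of an injected bit error matters so much, the sketch below flips a single bit of an IEEE-754 double, contrasting a flip in the exponent field with a flip in a low-order mantissa bit. This is not the paper's Critical Bit model itself, only a minimal demonstration of how bit position drives error magnitude.

```c
/* Hypothetical sketch of bit-level error injection into an IEEE-754 double. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Flip one bit of a double's binary representation. */
static double flip_bit(double x, int bit) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits ^= (uint64_t)1 << bit;
    memcpy(&x, &bits, sizeof x);
    return x;
}

int main(void) {
    double x = 3.141592653589793;
    /* Bit 62 sits in the exponent field; bit 2 is a low-order mantissa bit. */
    printf("original      : %.15g\n", x);
    printf("exponent flip : %.15g\n", flip_bit(x, 62));   /* drastic change */
    printf("mantissa flip : %.15g\n", flip_bit(x, 2));    /* negligible change */
    return 0;
}
```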

    Edge-TM: Exploiting transactional memory for error tolerance and energy efficiency

    No full text
    Scaling of semiconductor devices has enabled higher levels of integration and performance improvements at the price of making devices more susceptible to the effects of static and dynamic variability. Adding safety margins (guardbands) on the operating frequency or supply voltage prevents timing errors, but has a negative impact on performance and energy consumption. We propose Edge-TM, an adaptive hardware/software error management policy that (i) optimistically scales the voltage beyond the edge of safe operation for better energy savings and (ii) works in combination with a Hardware Transactional Memory (HTM)-based error recovery mechanism. The policy applies dynamic voltage scaling (DVS) (while keeping the frequency fixed) based on the feedback provided by the HTM, which makes it simple and generally applicable. Experiments on an embedded platform show our technique is capable of a 57% energy improvement compared to using voltage guardbands, and an extra 21-24% improvement over existing state-of-the-art error tolerance solutions, at a nominal area and time overhead.
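    A minimal sketch of the feedback idea described above, assuming invented step sizes and an invented abort model: lower the supply voltage step by step while HTM transactions commit cleanly, and raise it again as soon as aborts signal timing errors.

```c
/* Hypothetical DVS feedback loop driven by transaction outcomes.
 * The voltage levels, step size, and abort model are invented. */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

static double voltage = 1.00;                    /* guardbanded starting level */
static const double V_MIN = 0.70, V_STEP = 0.01;

/* Stub: aborts become more likely as voltage drops below ~0.85 V. */
static bool transaction_aborted(void) {
    double p = voltage < 0.85 ? (0.85 - voltage) * 10.0 : 0.0;
    return ((double)rand() / RAND_MAX) < p;
}

int main(void) {
    for (int txn = 0; txn < 200; ++txn) {
        if (transaction_aborted()) {
            voltage += V_STEP;                   /* recover, back off from the edge */
        } else if (voltage - V_STEP >= V_MIN) {
            voltage -= V_STEP;                   /* no error: try to save more energy */
        }
    }
    printf("settled near %.2f V\n", voltage);
    return 0;
}
```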

    Playing with fire: Transactional memory revisited for error-resilient and energy-efficient MPSOC execution

    No full text
    As silicon integration technology pushes toward atomic dimensions, errors due to static and dynamic variability are an increasing concern. To avoid such errors, designers often turn to "guardband" restrictions on the operating frequency and voltage. If guardbands are too conservative, they limit performance and waste energy, but less conservative guardbands risk moving the system closer to its Critical Operating Point (COP), a frequency-voltage pair that, if surpassed, causes massive instruction failures. In this paper, we propose a novel scheme that allows the system to dynamically adjust to an evolving COP and operate at highly reduced margins, while guaranteeing forward progress. Specifically, our scheme dynamically monitors the platform and adaptively adjusts to the COP across multiple cores, using lightweight checkpointing and roll-back mechanisms adopted from Hardware Transactional Memory (HTM) for error recovery. Experiments demonstrate that our technique is particularly effective in saving energy while also offering safe execution guarantees. To the best of our knowledge, this work is the first to describe a full-fledged HTM implementation for error-resilient and energy-efficient MPSoC execution.
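    As a rough software analogue of the checkpoint and roll-back mechanism borrowed from HTM, the sketch below snapshots state before each work chunk and, on a simulated timing error, restores it and re-executes; setjmp/longjmp stand in for the hardware checkpoint, and the error probability and chunk structure are invented.

```c
/* Hypothetical checkpoint/roll-back loop using setjmp/longjmp. */
#include <setjmp.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

static jmp_buf checkpoint;

/* Stub: models an error detector flagging a timing violation. */
static bool timing_error_detected(void) {
    return (rand() % 10) == 0;           /* assume ~10% of chunks fail */
}

static void run_chunk(int id) {
    if (timing_error_detected())
        longjmp(checkpoint, 1);          /* roll back to the last checkpoint */
    printf("chunk %d committed\n", id);
}

int main(void) {
    for (volatile int id = 0; id < 8; ++id) {
        if (setjmp(checkpoint) != 0) {
            /* Roll-back path: retreat to a safer frequency/voltage pair
             * before re-executing, so the chunk can make forward progress. */
            printf("chunk %d rolled back, retrying at a safer point\n", id);
        }
        run_chunk(id);
    }
    return 0;
}
```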

    Thrifty-malloc: a dynamic memory manager for embedded multicore systems with hardware transactional memory

    Get PDF
    This article presents thrifty-malloc: a dynamic memory manager compatible with hardware transactional memory (HTM), for embedded multicore systems. The manager combines modularity, ease of use, and HTM compatibility in a lightweight, memory-frugal design. Thrifty-malloc is easy to deploy and configure for non-expert programmers. It delivers good performance with low memory overhead for embedded applications with a high degree of parallelism running on many-core architectures. In addition, the transparent mechanisms that increase the manager's resilience to unpredictable dynamic situations incur only a small timing overhead.

    Speculative synchronization for coherence-free embedded NUMA architectures

    No full text
    High-end embedded systems, like their general-purpose counterparts, are turning to many-core cluster-based shared-memory architectures that provide a shared memory abstraction subject to non-uniform memory access (NUMA) costs. In order to keep the cores and memory hierarchy simple, many-core embedded systems tend to employ simple, scratchpad-like memories, rather than hardware-managed caches that require some form of cache coherence management. These "coherence-free" systems still require some means to synchronize memory accesses and guarantee memory consistency. Conventional lock-based approaches may be employed to accomplish the synchronization, but may lead to both usability and performance issues. Instead, speculative synchronization, such as hardware transactional memory, may be a more attractive approach. However, hardware speculative techniques traditionally rely on the underlying cache-coherence protocol to synchronize memory accesses among the cores. The lack of a cache-coherence protocol adds new challenges in the design of hardware speculative support. In this paper, we present a new scheme for hardware transactional memory support within a cluster-based NUMA system that lacks an underlying cache-coherence protocol. To the best of our knowledge, this is the first design for speculative synchronization for this type of architecture. Through a set of benchmark experiments using our simulation platform, we show that our design can achieve significant performance improvements over traditional lock-based schemes.

    Hardware Transactional Memory Exploration in Coherence-Free Many-Core Architectures

    No full text
    High-end embedded systems, like their general-purpose counterparts, are turning to many-core cluster-based shared-memory architectures that provide a shared memory abstraction subject to non-uniform memory access costs. In order to keep the cores and memory hierarchy simple, many-core embedded systems tend to employ simple, scratchpad-like memories, rather than hardware-managed caches that require some form of cache coherence management. These "coherence-free" systems still require some means to synchronize memory accesses and guarantee memory consistency. Conventional lock-based approaches may be employed to accomplish the synchronization, but may lead to both usability and performance issues. Instead, speculative synchronization, such as hardware transactional memory, may be a more attractive approach. However, hardware speculative techniques traditionally rely on the underlying cache-coherence protocol to synchronize memory accesses among the cores. The lack of a cache-coherence protocol adds new challenges in the design of hardware speculative support. In this article, we present a new scheme for hardware transactional memory (HTM) support within a cluster-based, many-core embedded system that lacks an underlying cache-coherence protocol. We propose two alternative data versioning implementations for the HTM support, Full-Mirroring and Distributed Logging, and we conduct a performance comparison between them. To the best of our knowledge, these are the first designs for speculative synchronization for this type of architecture. Through a set of benchmark experiments using our simulation platform, we show that our designs can achieve significant performance improvements over traditional lock-based schemes.
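    The paper's Full-Mirroring and Distributed Logging schemes are hardware designs; the sketch below only illustrates the underlying data-versioning idea in software, using an undo log: each transactional write records the old value, an abort replays the log in reverse, and a commit simply discards it. All names and sizes here are invented.

```c
/* Hypothetical undo-log data versioning for a software transaction. */
#include <stdio.h>

#define LOG_CAPACITY 64

struct undo_entry { int *addr; int old_value; };
static struct undo_entry undo_log[LOG_CAPACITY];
static int log_len = 0;

/* Transactional store: save the old value, then perform the write. */
static void tx_write(int *addr, int value) {
    undo_log[log_len].addr = addr;
    undo_log[log_len].old_value = *addr;
    ++log_len;
    *addr = value;
}

/* Abort: walk the log backwards and restore every overwritten value. */
static void tx_abort(void) {
    while (log_len > 0) {
        --log_len;
        *undo_log[log_len].addr = undo_log[log_len].old_value;
    }
}

/* Commit: the new values are already in place, so just drop the log. */
static void tx_commit(void) { log_len = 0; }

int main(void) {
    int a = 1, b = 2;
    tx_write(&a, 10);
    tx_write(&b, 20);
    tx_abort();                                   /* conflict detected: roll back */
    printf("after abort : a=%d b=%d\n", a, b);    /* prints 1 2 */
    tx_write(&a, 10);
    tx_commit();
    printf("after commit: a=%d b=%d\n", a, b);    /* prints 10 2 */
    return 0;
}
```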

    Thrifty-malloc: A HW/SW codesign for the dynamic management of hardware transactional memory in embedded multicore systems

    No full text
    We present thrifty-malloc: a transaction-friendly dynamic memory manager for high-end embedded multicore systems. The manager combines modularity, ease of use, and hardware transactional memory (HTM) compatibility in a lightweight and memory-efficient design. Thrifty-malloc is easy to deploy and configure for non-expert programmers, yet provides good performance with low memory overhead for highly parallel embedded applications running on massively parallel processor arrays (MPPAs) or many-core architectures. In addition, the transparent mechanisms that increase our manager's resilience to unpredictable dynamic situations incur a low timing overhead in comparison to established techniques.
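    One common way to make an allocator transaction-friendly is to confine allocation inside a transaction to thread-local state, so that no shared allocator metadata can cause conflicts and aborts. Whether thrifty-malloc does exactly this is described in the paper; the hypothetical sketch below only shows that general principle with a per-thread bump-pointer pool that would be refilled outside of any transaction.

```c
/* Hypothetical per-thread pool allocator; names and sizes are invented. */
#include <stddef.h>
#include <stdio.h>

#define POOL_BYTES 4096

struct thread_pool {
    unsigned char buf[POOL_BYTES];
    size_t used;
};

/* One pool per thread; _Thread_local (C11) keeps it private to the caller. */
static _Thread_local struct thread_pool pool;

/* Safe to call inside a transaction: bump-pointer, thread-local only.
 * A real allocator would also round sizes up to keep allocations aligned. */
static void *tx_alloc(size_t size) {
    if (pool.used + size > POOL_BYTES)
        return NULL;                 /* pool exhausted: refill outside the txn */
    void *p = pool.buf + pool.used;
    pool.used += size;
    return p;
}

int main(void) {
    int *x = tx_alloc(sizeof *x);
    if (x) { *x = 42; printf("allocated %d from the private pool\n", *x); }
    return 0;
}
```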

    Transparent and energy-efficient speculation on NUMA architectures for embedded MPSoCs (Proceedings of the First International Workshop on Many-core Embedded Systems - MES '13)

    No full text
    High-end embedded systems such as smart phones, game consoles, GPS-enabled automotive systems, and home entertainment centers are becoming ubiquitous. Like their general-purpose counterparts, and for many of the same energy-related reasons, embedded systems are turning to multicore architectures. Moreover, as the demand for more compute-intensive capabilities for embedded systems increases, these multicore architectures will evolve into many-core systems for improved performance or performance/area/Watt. These systems are often organized as cluster-based Non-Uniform Memory Access (NUMA) architectures that provide the programmer with a shared-memory abstraction, with the cost of sharing memory (in terms of performance, energy, and complexity) varying substantially depending on the locations of the communicating processes. This paper investigates one of the principal challenges presented by these emerging NUMA architectures for embedded systems: providing efficient, energy-effective, and convenient mechanisms for synchronization and communication. In this paper, we propose an initial solution based on hardware support for speculative synchronization.